Automatic expansion of abbreviations by using context and character information

Authors

  • Akira Terada
  • Takenobu Tokunaga
  • Hozumi Tanaka
Abstract

Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations in particular are difficult to read and process because they are often domain-specific. In this paper, we propose a method for automatically expanding abbreviations using context and character information. Previous studies searched dictionaries for abbreviation expansion candidates (candidate words for the original form of an abbreviation). Instead of a dictionary, we use a corpus from the same field that contains few abbreviations. We score the adequacy of each expansion candidate by the similarity between the context of the target abbreviation and the context of the candidate. The similarity is calculated using a vector space model in which the vector elements correspond to the words surrounding the target abbreviation and those surrounding its expansion candidate. Experiments on approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
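The context-similarity idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the weighting scheme, the window size, or how character information is used. The sketch assumes raw word counts in a fixed window, cosine similarity as the vector space measure, and an ordered-subsequence test as a stand-in for the character information. All function names (`context_vector`, `cosine`, `is_subsequence`, `rank_candidates`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            counts.update(tokens[max(0, i - window):i])   # left context
            counts.update(tokens[i + 1:i + 1 + window])   # right context
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def is_subsequence(abbr, word):
    """Assumed character filter: the abbreviation's letters appear in order in the word."""
    remaining = iter(word.lower())
    return all(ch in remaining for ch in abbr.lower())


def rank_candidates(abbr, abbr_tokens, corpus_tokens, window=2):
    """Rank corpus words that pass the character filter by context similarity."""
    abbr_vec = context_vector(abbr_tokens, abbr, window)
    candidates = {w for w in corpus_tokens if w != abbr and is_subsequence(abbr, w)}
    scored = [(cosine(abbr_vec, context_vector(corpus_tokens, w, window)), w)
              for w in candidates]
    # Highest similarity first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Toy data, invented for illustration only.
abbr_tokens = "check alt before landing".split()
corpus_tokens = "check altitude before landing and call alternate airport tower".split()
ranked = rank_candidates("alt", abbr_tokens, corpus_tokens)
```

In this toy example both "altitude" and "alternate" pass the character filter, but only "altitude" shares surrounding words with "alt", so it ranks first. A realistic setup would likely need term weighting and a much larger corpus.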


Similar resources

An easily implemented method for abbreviation expansion for the medical domain in Japanese text. A preliminary study.

BACKGROUND One of the barriers to the effective use of computerized health-care related text is the ambiguity of abbreviations. To date, the task of disambiguating abbreviations has been treated as a classification task based on surrounding words. Application of this framework to languages that have no word boundaries requires pre-processing to segment a sentence into separate word sequences....


Using String Comparison in Context for Improved Relevance Feedback in Different Text Media

Query expansion is a long standing relevance feedback technique for improving the effectiveness of information retrieval systems. Previous investigations have shown it to be generally effective for electronic text, to give proportionally better improvement for automatic transcriptions of spoken documents, and to be at best of questionable utility for optical character recognized scanned text do...


Identification of the underlying factors affecting information seeking behavior of users interacting with the visual search option in EBSCO: a grounded theory study

Background and Aim: Information seeking is the interactive behavior of a searcher with information systems, and this active interaction occurs in a real environment known as background or context. This study investigated the factors influencing the formation of layers of context and their impact on the interaction of the user with the search option dialogue in the EBSCO database. Method: Data from 28 semi-stru...


Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "tweets" via one of the machine l...


A Character-Level Machine Translation Approach for Normalization of SMS Abbreviations

This paper describes a two-phase method for expanding abbreviations found in informal text (e.g., email, text messages, chat room conversations) using a machine translation system trained at the character level during the first phase. In this way, the system learns mappings between character-level “phrases” and is much more robust to new abbreviations than a word-level system. We generate trans...



Journal:
  • Inf. Process. Manage.

Volume 40

Publication date: 2004